Using AutoMed for Data Warehousing

Author

  • Hao Fan
Abstract

A data warehouse consists of a set of materialized views defined over a number of data sources. It collects copies of data from remote, distributed, autonomous and heterogeneous data sources into a central repository to enable analysis and mining of the integrated information. Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services relating to data warehousing are now available, and all of the principal data management system vendors, such as Oracle, IBM Informix, and MS SQL Server, have offerings in these areas.

Research problems in data warehousing include data warehouse architecture design, information quality and data cleansing, maintaining data warehouses, selecting views to materialize, workflow data management [2], and data lineage tracing in data warehouses. A good overview of data warehousing and OLAP technology is given in [5, 48]. More and more data warehouses now integrate data from multiple autonomous data sources, and extending existing warehouse activities to heterogeneous database environments is a new challenge in data warehousing research.

AutoMed is a data transformation and integration system that supports both virtual and materialized integration of schemas expressed in a variety of modelling languages. The system is being developed in a collaborative EPSRC-funded project between Birkbeck and Imperial Colleges, London (see http://www.ic.ac.uk/automed).

Common to many methods for integrating heterogeneous data sources is the requirement for logical integration [26] of the data, due to variations in the design of data models for the same universe of discourse. A common approach is to define a single integrated schema expressed using a common data model. Using a high-level modelling language (e.g. ER, OO or relational) as the common data model can be complicated, because the original and transformed schemas may be represented in different high-level modelling languages and there may not be a simple semantic correspondence between their modelling constructs.

In previous work within the AutoMed project [43, 37, 38], a general framework has been developed to support schema transformation and integration. This framework provides a lower-level hypergraph-based data model (HDM) as the common data model, together with a set of primitive schema transformations for schemas defined in this data model. One advantage of using a low-level data model such as the HDM is that semantic mismatches between modelling constructs are avoided. Another advantage is that it provides a unifying semantics for higher-level modelling constructs. In particular, [38] shows how ER, relational and UML data models, and the set of primitive schema transformations on each of them, can be defined in terms of the lower-level HDM and its set of primitive schema transformations. That paper also discusses how inter-model transformations are possible within the AutoMed framework, thus allowing a schema expressed in one high-level modelling language to be incrementally transformed into a schema expressed in a different high-level modelling language. The approach was extended to encompass XML data sources in [39], and ongoing work [3] in AutoMed is extending its scope to encompass formatted data files, plain text files, and other semi-structured models, YATTA [32] and RDF [49].

In this report, I discuss how the AutoMed approach can be used for data warehousing processes, especially for Data Lineage Tracing (DLT) in a heterogeneous data warehousing environment.
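The idea of a low-level graph-based common data model with primitive schema transformations can be illustrated with a small Python sketch. This is not the AutoMed API: the class `Schema`, the operation names (`add_node`, `del_node`, `add_edge`, `del_edge`) and the example table `person` are all hypothetical names chosen for illustration, and the sketch simplifies the HDM by letting edges connect nodes only and by ignoring constraints and data. It shows a relational table being encoded as nodes and an edge, and a sequence of primitive steps being undone by applying the inverse steps in reverse order.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Toy graph schema: node names plus named edges over those nodes."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)  # each edge: (name, (node, ...))


@dataclass(frozen=True)
class Step:
    """One primitive transformation (hypothetical names, not AutoMed's)."""
    op: str      # "add_node", "del_node", "add_edge" or "del_edge"
    arg: object  # a node name, or an edge tuple (name, (node, ...))


# Each primitive step has an inverse, which makes a sequence reversible.
INVERSE = {
    "add_node": "del_node", "del_node": "add_node",
    "add_edge": "del_edge", "del_edge": "add_edge",
}


def apply(schema: Schema, step: Step) -> Schema:
    """Return a new schema with one primitive step applied."""
    nodes, edges = set(schema.nodes), set(schema.edges)
    if step.op == "add_node":
        if step.arg in nodes:
            raise ValueError(f"node already exists: {step.arg}")
        nodes.add(step.arg)
    elif step.op == "del_node":
        if any(step.arg in members for _, members in edges):
            raise ValueError(f"node still used by an edge: {step.arg}")
        nodes.remove(step.arg)
    elif step.op == "add_edge":
        _, members = step.arg
        if not set(members) <= nodes:
            raise ValueError(f"edge refers to unknown nodes: {step.arg}")
        edges.add(step.arg)
    elif step.op == "del_edge":
        edges.remove(step.arg)
    else:
        raise ValueError(f"unknown operation: {step.op}")
    return Schema(nodes, edges)


def run(schema: Schema, steps: list) -> Schema:
    """Apply a sequence of primitive steps in order."""
    for step in steps:
        schema = apply(schema, step)
    return schema


def reverse(steps: list) -> list:
    """Build the inverse sequence: inverse steps, in reverse order."""
    return [Step(INVERSE[s.op], s.arg) for s in reversed(steps)]


# Encode a hypothetical relational table person(name) as two nodes and
# one edge linking the table node to its attribute node.
steps = [
    Step("add_node", "person"),
    Step("add_node", "person.name"),
    Step("add_edge", ("person_name", ("person", "person.name"))),
]
built = run(Schema(), steps)
restored = run(built, reverse(steps))
```

Here `built` holds the graph encoding of the table, and `restored` is equal to the empty schema the sequence started from. Keeping every step invertible is what would let a pathway between two schemas be followed in either direction.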

Similar resources

Investigating a heterogeneous data integration approach for data warehousing

Data warehouses integrate data from remote, heterogeneous, autonomous data sources into a materialised central database. The heterogeneity of these data sources has two aspects, data expressed in different data models, called model heterogeneity, and data expressed within different schemas of the same data model, called schema heterogeneity. AutoMed is an approach to heterogeneous data transfor...

Schema Evolution in Data Warehousing Environments - A Schema Transformation-Based Approach

In heterogeneous data warehousing environments, autonomous data sources are integrated into a materialised integrated database. The schemas of the data sources and the integrated database may be expressed in different modelling languages. It is possible for either the data source schemas or the warehouse schema to evolve. This evolution may include evolution of the schema, or evolution of the m...

Using AutoMed for XML data transformation and integration

This paper describes how the AutoMed data integration system is being extended to support the integration of heterogeneous XML documents. So far, the contributions of this research have been the development of two algorithms. One restructures the schema describing an XML document into another schema, and the other materialises an integrated schema resulting from the transformation of several so...

Knowledge Transformation using a Hypergraph Data Model

In the Semantic Web, knowledge integration is frequently performed between heterogeneous knowledge bases. Such knowledge integration often requires the schema expressed in one knowledge modelling language be translated into an equivalent schema in another knowledge modelling language. This paper defines how schemas expressed in OWL-DL (the Web Ontology Language using Description Logic) can be t...

View Generation and Optimisation in the AutoMed Data Integration Framework

This paper describes view generation and view optimisation in the AutoMed heterogeneous data integration framework. In AutoMed, schema integration is based on the use of reversible schema transformation sequences. We show how views can be generated from such sequences, for global-as-view (GAV), local-as-view (LAV) and GLAV query processing. We also present techniques for optimising these genera...

Publication date: 2003